AITopics

Neural Information Processing Systems, Apr-24-2026, 20:32:15 GMT

15c00b5250ddedaabc203b67f8b034fd-Paper.pdf

machine learning, natural language, translation, (19 more...)

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Feb-14-2026, 15:02:01 GMT

A Tensorized Transformer for Language Modeling

Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Ming Zhou, Dawei Song

Neural Information Processing Systems http://nips.cc/

multi-linear attention, tensor, transformer, (14 more...)

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

Neural Information Processing Systems, Feb-11-2026, 21:06:55 GMT

f18a6d1cde4b205199de8729a6637b42-Supplemental.pdf

asymmetric multi-head attention, multi-head attention, original multi-head attention, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-7-2026, 14:55:04 GMT

Domain Sequence Modeling

We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling.

machine learning, natural language, translation, (18 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Neural Information Processing Systems, Nov-20-2025, 04:59:23 GMT

], and simulating complex circuits

In particular, we consider the sparse linear regression task, i.e., the data is generated from a

large language model, machine learning, natural language, (17 more...)

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Borde, Haitz Sáez de Ocáriz

Beyond Parallelism: Synergistic Computational Graph Effects in Multi-Head Attention

arXiv.org Artificial Intelligence, Nov-11-2025

Yet, the theoretical advantages of multi-head versus single-head attention, beyond mere parallel processing, remain underexplored. In this paper, we reframe multi-head attention as a system of potentially synergistic computational graphs, where each head functions as a feedforward directed acyclic graph (DAG) with a common sink state. We provide intuition and preliminary theoretical analysis of mixing time and minimax fidelity in this framework. Our results show that multi-head attention can synergistically enhance information propagation, yielding faster mixing times and minimax fidelity amplification under specific head-diversity conditions. Finally, we train single-head and multi-head Transformers, each with the same total number of parameters, on sequence manipulation tasks and empirically verify the predicted effects.
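The equal-parameter comparison mentioned in the abstract can be illustrated with a small count. This is a sketch under assumptions not stated in the listing: a standard attention layer with bias-free Q, K, V, and output projections, and a head width of `d_model / num_heads`. The function name `attention_param_count` is hypothetical, and the paper's actual experimental setup may differ.

```python
def attention_param_count(d_model: int, num_heads: int) -> int:
    """Count weights in one attention layer (assumed layout, no biases)."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    # Each head owns Q, K, and V projections of shape (d_model, d_head).
    per_head = 3 * d_model * d_head
    # The output projection maps the concatenated heads back to d_model.
    output_proj = (num_heads * d_head) * d_model
    return num_heads * per_head + output_proj


single = attention_param_count(d_model=64, num_heads=1)
multi = attention_param_count(d_model=64, num_heads=8)
print(single, multi)
```

Under these assumptions the count is `4 * d_model**2` for any valid head count, so splitting a layer into more heads changes how the parameters are partitioned rather than how many there are.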

artificial intelligence, fidelity, machine learning, (12 more...)

2507.02944

Country: Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Chakraborty, Rajatsubhra, Espinosa-Momox, Ana, Haskin, Riley, Xu, Depeng, Porras-Aguilar, Rosario

DM-QPMNET: Dual-modality fusion network for cell segmentation in quantitative phase microscopy

arXiv.org Artificial Intelligence, Nov-4-2025

Cell segmentation in single-shot quantitative phase microscopy (ssQPM) faces challenges from traditional thresholding methods that are sensitive to noise and cell density, while deep learning approaches using simple channel concatenation fail to exploit the complementary nature of polarized intensity images and phase maps. We introduce DM-QPMNet, a dual-encoder network that treats these as distinct modalities with separate encoding streams. Our architecture fuses modality-specific features at intermediate depth via multi-head attention, enabling polarized edge and texture representations to selectively integrate complementary phase information. This content-aware fusion preserves training stability while adding principled multi-modal integration through dual-source skip connections and per-modality normalization at minimal overhead. Our approach demonstrates substantial improvements over monolithic concatenation and single-modality baselines, showing that modality-specific encoding with learnable fusion effectively exploits ssQPM's simultaneous capture of complementary illumination and phase cues for robust cell segmentation.

artificial intelligence, machine learning, segmentation, (18 more...)

2511.00218

Country: North America > United States > North Carolina (0.15)

Genre: Research Report (0.82)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence, Oct-28-2025

Knocking-Heads Attention

Zhou, Zhanchao, Chen, Xiaodong, Chen, Haoxing, Lan, Zhenzhong, Li, Jianguo

Multi-head attention (MHA) has become the cornerstone of modern large language models, enhancing representational capacity through parallel attention heads. However, increasing the number of heads inherently weakens individual head capacity, and existing attention mechanisms - whether standard MHA or its variants like grouped-query attention (GQA) and grouped-tied attention (GTA) - simply concatenate outputs from isolated heads without strong interaction. To address this limitation, we propose knocking-heads attention (KHA), which enables attention heads to "knock" on each other - facilitating cross-head feature-level interactions before the scaled dot-product attention. This is achieved by applying a shared, diagonally-initialized projection matrix across all heads. The diagonal initialization preserves head-specific specialization at the start of training while allowing the model to progressively learn integrated cross-head representations. KHA adds only minimal parameters and FLOPs and can be seamlessly integrated into MHA, GQA, GTA, and other attention variants. We validate KHA by training a 6.1B parameter MoE model (1.01B activated) on 1T high-quality tokens. Compared to baseline attention mechanisms, KHA brings superior and more stable training dynamics, achieving better performance across downstream tasks.
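One possible reading of the mechanism described in this abstract is sketched below in plain Python. It is an illustration, not the authors' implementation. The abstract only says that a shared, diagonally-initialized projection is applied across all heads before scaled dot-product attention. The choices to use one shared matrix each for queries, keys, and values, and to use the identity as the diagonal initialization, are assumptions. All function names here are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for one head (seq x d_head matrices)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


def identity(n):
    """Identity matrix, used here as an assumed diagonal initialization."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def knocking_heads(heads_q, heads_k, heads_v, w_q, w_k, w_v):
    """Apply the same projections to every head, then attend per head."""
    return [
        attention(matmul(q, w_q), matmul(k, w_k), matmul(v, w_v))
        for q, k, v in zip(heads_q, heads_k, heads_v)
    ]


# Two heads, sequence length 2, head width 2 (toy values).
heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]]
eye = identity(2)
shared = knocking_heads(heads, heads, heads, eye, eye, eye)
plain = [attention(h, h, h) for h in heads]
```

With the identity initialization assumed here, `shared` matches `plain`, which mirrors the abstract's point that the initialization preserves head-specific behavior at the start of training. Because the same matrices are applied to every head, any later update to them affects all heads at once.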

large language model, machine learning, natural language, (18 more...)

2510.23052

Country:

Europe (0.68)
North America > Mexico (0.28)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Oct-21-2025

Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity

Wang, Tuowei, Li, Kun, Hao, Zixu, Bai, Donglin, Ren, Ju, Zhang, Yaoxue, Cao, Ting, Yang, Mao

The adaptation of pre-trained large language models (LLMs) to diverse downstream tasks via fine-tuning is critical for numerous applications. However, the inefficiency of parameter-efficient fine-tuning (PEFT) techniques presents significant challenges in terms of time investments and operational costs. In this paper, we first introduce a nuanced form of sparsity, termed Shadowy Sparsity, which is distinctive in fine-tuning and has not been adequately addressed for acceleration. Under Shadowy Sparsity, we propose Long Exposure, an efficient system to accelerate PEFT for LLMs. Long Exposure comprises three key components: Shadowy-sparsity Exposer employs a prolonged sensing range to capture more sparsity details under shadowy sparsity; Sequence-oriented Predictor provides efficient yet accurate predictions to handle large sequence inputs and constantly-evolving parameters; and Dynamic-aware Operator facilitates more structured computational patterns and coalesced memory accesses, addressing dynamic sparse operations. Extensive evaluations show that Long Exposure outperforms state-of-the-art systems with up to a $2.49\times$ speedup in end-to-end fine-tuning, offering promising advancements in accelerating PEFT for LLMs.

large language model, machine learning, sparsity, (19 more...)

doi: 10.1109/SC41406.2024.00081

2510.15964

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Media > Photography (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)